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Abstract 

On 

In this paper we present distributed and adaptive algorithms for motion coordination of a group of m autonomous 
CNI vehicles. The vehicles operate in a convex environment with bounded velocity and must service demands whose time 

of arrival, location and on-site service are stochastic; the objective is to minimize the expected system time (wait plus 
service) of the demands. The general problem is known as the m-vehicle Dynamic Traveling Repairman Problem 
(m-DTRP). The best previously known control algorithms rely on centralized a-priori task assignment and are not 

o 

robust against changes in the environment, e.g. changes in load conditions; therefore, they are of limited applicability 
in scenarios involving ad-hoc networks of autonomous vehicles operating in a time-varying environment. 

First, we present a new class of policies for the 1-DTRP problem that: (i) are provably optimal both in light- 
and heavy-load condition, and (ii) are adaptive, in particular, they are robust against changes in load conditions. 
r/2 Second, we show that partitioning policies, whereby the environment is partitioned among the vehicles and each 

i ^ i vehicle follows a certain set of rules in its own region, are optimal in heavy-load conditions. Finally, by combining 

the new class of algorithms for the 1-DTRP with suitable partitioning policies, we design distributed algorithms for 
the m-DTRP problem that (i) are spatially distributed, scalable to large networks, and adaptive to network changes, 
(ii) are within a constant-factor of optimal in heavy-load conditions and stabilize the system in any load condition. 
Simulation results are presented and discussed. 

rn 

CO 

Q I. INTRODUCTION 

OS 

Advances in computer technology and wireless communications, coupled with new perceived critical needs of 
our society, motivate the rapidly increasing interest in the design and deployment of large networks of autonomous 
mobile vehicles capable of interacting directly with the environment [1], Potential applications are numerous 
and include surveillance and exploration missions, where vehicles are required to visit points of interest, and 
transportation and distribution tasks, where vehicles are required to provide service or goods to people. In this 
context, one of the major scientific challenges is the design of possibly distributed coordination policies to efficiently 
assign vehicles to demands (or targets) that are spatially dispersed over the environment. 

In the recent past, considerable efforts have been devoted to the problem of how to assign, cooperatively, vehicles 
to demands [2]-[7]. In these papers, the underlying mathematical model is static and fits within the framework of 
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the Vehicle Routing Problem (see [8] for a thoroughly introduction to the VRP), whereby: (i) a team of m vehicles 
is required to service a set of n demands in a 2-dimensional space; (ii) associated with a demand there is an on-site 
service time; (iii) the goal is to compute a set of routes that optimizes the cost of servicing (according to some 
quality of service metric) the demands. The VRP is static in the sense that vehicles' routes are computed assuming 
that no new demands arrive. 

Despite its richness, inherent elegance and frequent occurrence in practical problems, the VRP is often a static 
approximation to problems that are in reality dynamic, and therefore it fails to be an appropriate model in many 
applications of interest. In fact, demands often arrive sequentially in time, and planning algorithms should actually 
provide policies (in contrast to pre-planned routes) that prescribe how the routes should evolve as a function of those 
inputs that evolve in real-time. From a methodological point of view, dynamic demands (i.e., demands that vary 
over time) add queueing phenomena to the combinatorial nature of the VRP. As a motivating example, consider 
a surveillance mission where a team of Unmanned Aerial Vehicles (UAVs) must ensure continued coverage of a 
certain area; as events occur, i.e., as new targets are detected by on-board sensors or other assets (e.g., intelligence, 
high-altitude or orbiting platforms, etc.), UAVs must proceed to the location of the new event and provide close- 
range information about the target. Then, the UAV mission control is a continuous process of detecting targets, 
planning routes and sending UAVs. In such a dynamic setting, stability of the overall system is an additional issue 
to be addressed. 

In the present work, we wish to address the assignment problem in the more general framework where new 
demands for service are generated over time by a stochastic process. In [9], the authors present a setting where 
demands arrive sequentially over time and routes are generated in real-time, however the optimization problem 
is over a finite horizon and the analysis only addresses a number of vehicle m < 2. Arguably, the most general 
model for vehicle routing problems that have both a dynamic and a stochastic component is the m-vehicle Dynamic 
Traveling Repairman Problem (m-DTRP) [10]— [13]. In the m-DTRP, m agents operating in a convex environment 
and traveling with bounded velocity must service demands whose time of arrival, location and on-site service are 
stochastic; the objective is to find a policy to service demands over an infinite horizon that minimizes the expected 
system time (wait plus service) of the demands. The best previously known control policies for the m-DTRP rely 
on centralized task assignment and are not robust against changes in the environment, in particular changes in 
load conditions; therefore, they are of limited applicability in scenarios involving ad-hoc networks of autonomous 
vehicles operating in a time-varying environment. 

In this paper we focus on unbiased policies for the DTRP, i.e., policies that provide the same quality of service 
to all demands independently of their locations. The contribution is threefold: First, we present a new class of 
unbiased policies for the 1-DTRP problem. In particular, we propose the Divide & Conquer (DC) policy, whose 
performance depends on a design parameter r. If r — > +oo, the policy is (i) provably optimal both in light and 
heavy-load conditions, and (ii) adaptive with respect to changes in the load conditions and in the statistics of the 
on-site service requirement; if, instead, r = 1, the policy is (i) provably optimal in light-load condition and within a 
factor of 2 from optimality in heavy-load condition, and (ii) adaptive with respect to all problem data, in particular, 
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and perhaps surprisingly, it does not require any knowledge about the demand generation process. Moreover, by 
applying ideas of receding-horizon control to dynamic vehicle routing problems, we introduce the Receding Horizon 
(RH) policy, that also does not require any knowledge about the demand generation process; we will show that 
the RH policy is optimal in light-load and stable in any load condition, and we will heuristically argue that its 
performance is close to optimality in heavy-load condition. Second, we show that partitioning policies, whereby the 
environment is partitioned among the vehicles and each vehicle follows a certain set of rules in its own region, are 
optimal in heavy -load conditions. Finally, by combining our new class of algorithms for the 1-DTRP with suitable 
partitioning policies, we design distributed algorithms for the m-DTRP problem that (i) are spatially distributed, 
scalable to large networks, and adaptive to network changes, (ii) are within a constant-factor of optimal in heavy- 
load conditions and stabilize the system in any load condition. Here, by network changes we mean changes in the 
number of vehicles, in the environment boundaries, in the arrival rate of demands, and in the characterization of 
the on-site service requirement. 

The paper is structured as follows. In Section [II] we provide some background on probability theory, on the 
Euclidean Traveling Salesman Problem, and on the continuous multi-median problem. In Section [TIT] we formulate 
the problem we wish to address, and we review existing results. In Sections IV and [V] we present and analyze, 



respectively, the Divide & Conquer policy and the Receding Horizon policy. In Section |VI] building upon the two 
previous policies for the 1-DTRP, we discuss partitioning policies and we design distributed algorithms for the 



m-DTRP. In Section \VU\ we present results from numerical experiments, and finally, in Section [Vni| we draw some 
conclusions and discuss some directions for future work. 

II. Preliminaries 

In this section, we briefly describe some known concepts from probability and locational optimization, on which 
we will rely extensively later in the paper. 

A. Jensen's Inequality 

Suppose h(-) is a convex function and X a random variable, then 

E[h(X)]>h(E [X]), 

provided both expectations exist. 

B. Equitable Partitions 

Let Q C M. d be a bounded, convex set. An r-partition of Q (where r £ N, r > 1) is a collection of r closed subsets 
{Qk}k with disjoint interiors whose union is Q. Given a measurable function / : Q — > M>o with Jq f{x) dx = 1, 
an r-partition {Qk}k is equitable with respect to / if Jq f(x) dx — l/r for all k G {1, . . . , r}. Similarly, given 
two measurable functions fj : Q — > R>o, j £ {1, 2}, with j Q fj(x) dx — 1, an r-partition {Qk}k is simultaneously 
equitable with respect to /i and f<z if Jq fj(x) dx — l/r for all k £ {1, . . . , r} and j £ {1, 2}. Theorem 12 in [14] 
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and Corollary 3 in [15] show that, given two measurable functions fj'.Q—* R >0 , j E {1, 2}, with J Q fj(x) dx = 1, 
there always exists an r-partition that is simultaneously equitable with respect to fx and fa and where the subsets 
Qk are convex. 

C. Asymptotic and Worst-Case Properties of the Traveling Salesman Problem in the Euclidean Plane 

The Euclidean Traveling Salesman Problem (TSP) is formulated as follows: given a set D of n points in R d , 
find the minimum-length tour (i.e., cycle that visits all nodes exactly once) of D; the length of a tour is the sum 
of all Euclidean distances on the tour. Let TSP(£>) denote the minimum length of a tour through all the points in 
D; by convention, TSP(0) = 0. Assume that the locations of the n points are random variables independently and 
identically distributed in a compact set Q; in [16] it is shown that there exists a constant Prsp.d such that, almost 
surely, 

where / is the density of the absolutely continuous part of the distribution of the points. For the case d — 2, the 
constant /3tsp,2 has been estimated numerically as /3tsp,2 — 0.7120 ± 0.0002, [17]. 

Notice that the bound Q holds for all compact sets: the shape of the set only affects the convergence rate to 
the limit. According to [18], if Q is a "fairly compact and fairly convex" set in the plane, then Eq. ([TJ provides an 
adequate estimate of the optimal TSP tour length for values of n as low as 15. 

Finally, for a certain environment Q, the following ( deterministic) bound holds on the length of the TSP tour, 
uniformly on n [16]: 

TSP(JJ) < l3 QA \Q\ 1/d n^- 1 ^ d . (2) 

where |Q| is the Lebesgue measure of Q. The price of determinism is that the constant /3g.d depends on the shape 
of the environment and is generally larger than /?TSP,ti- 

D. Tools for solving TSPs 

The TSP is known to be NP-complete, which suggests that there is no general algorithm capable of finding the 
optimum tour in an amount of time polynomial in the size of the input. Even though the exact optimal solution of 
a large TSP can be very hard to compute, several exact and heuristic algorithms and software tools are available 
for the numerical solution of Euclidean TSPs. 

The most advanced TSP solver to date is arguably concorde [19]. Heuristic polynomial-time algorithms are 
available for constant-factor approximations of TSP solutions, among which we mention Christofides' [20]. On a 
more theoretical side, Arora proved the existence of polynomial-time approximation schemes, providing a (1 + e) 
constant-factor approximation for any s > [21]. 

A modified version of the Lin-Kernighan heuristic [22] is implemented in linkern; this powerful solver yields 
approximations in the order of 5% of the optimal tour cost very quickly for many instances. For example, in our 



March 23, 2009 



DRAFT 



5 



numerical experiments on a 2.4 GHz Pentium machine, approximations of random TSPs with 1,000 points typically 
required about two seconds of CPU timeQ 

In the following, we will present algorithms that require on-line solutions of large TSPs. Practical implementations 
of the algorithms will rely on heuristics, such as Lin-Kernighan's or Christofides'. If a constant-factor approximation 
algorithm is used, the effect on the asymptotic performance guarantees of our algorithms can be simply modeled 
as a scaling of the constant /?TSP,d- 

E. The Continuous Multi-Median Problem 

Given a set Q C M. d and a vector P — (pi, . . . ,p m ) of m distinct points in Q, the expected distance between a 
random point q, generated according to a probability density function /, and the closest point in P is given by 



where V(P, Q) — (Vi(P, <2), . . . , V m (P, Q)) is the Voronoi partition of the set Q generated by the points P. In 
other words, q £ Vfc(P, Q) if \\q — Pk\\ < ||<7 — Pj\\ , for all j S {1, . . . , m}. The set Vt is referred to as the Voronoi 
cell of the generator pk ■ The function H m is known in the locational optimization literature as the continuous Weber 
function or the continuous multi-median function; see [23], [24] and references therein. 

The m-median of the set Q, with respect to the measure induced by /, is the global minimizer 



We let H* n (Q) = H m (P^(Q),Q) be the global minimum of H m . It is straightforward to show that the map 
P i— > Hi(P, Q) is differentiable and strictly convex on Q. Therefore, it is a simple computational task to compute 
P\(Q)- It is convenient to refer to P* (Q) as the median of Q. On the other hand, the map P i— > H m (P, Q) is 
differentiable (whenever {pi,... ,p m ) are distinct) but not convex, thus making the solution of the continuous m- 
median problem hard in the general case. It is known [23], [25] that the discrete version of the m-median problem 
is NP-hard for d > 2. The set of critical points of H m contains all configurations {pi, . . . ,p m ) with the property 
that each point pk is the generator of the Voronoi cell Vfe(P, Q) as well as the median of Vfc(P, Q) (we refer to 
such Voronoi diagrams as median Voronoi diagrams). 



In this section we formulate the problem we wish to address and we review existing results. 

'Both concorde and linkern are written in ANSI C and are freely available for academic research use at 
http : //www .math . princeton . edu/tsp/ Concorde . html 
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argminP^P, Q). 
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III. Problem Formulation and Existing Results 
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A. Problem Formulation 

The basic version of the problem we wish to study in this paper is known as the Dynamic Traveling Repairman 
Problem (DTRP), which was introduced by Psaraftis in [10] and mainly studied by Bertsimas and van Ryzin in 
[1 1]— [13]. In this section, we define the DTRP problem and we review existing results. 

Let the workspace Q C R d be a bounded, convex set, and let || • || denote the Euclidean norm in R d . We define the 
diameter of Q as: diam(Q) = sup{||p — q\\ \p,q e Q}. For simplicity, we will only consider the planar case, i.e., 
d= 2, with the understanding that extensions to higher dimensions are possible. Consider m holonomic vehicles, 
modeled as point masses, and let 



describe the locations of the vehicles at time t. The vehicles are free to move, traveling at a constant velocity v, 
within the environment Q. The vehicles are identical, and have unlimited fuel and demand servicing capacity. 

Demands are generated according to a homogeneous (i.e., time-invariant) spatio-temporal Poisson process, with 
time intensity A, and spatial density / : Q — > M>o- In other words, demands arrive to Q according to a Poisson 
process with intensity A, and their locations {Xi, i 1} are i.i.d. (i.e., independent and identically distributed) 
and distributed according to a density /(•) whose support is Q. A demand's location becomes known (is realized) 
at the demand's arrival epoch; thus, at time t we know with certainty the locations of demands that arrived prior 
to time t, but future demand locations form an i.i.d sequence. The notation f(x) is short for f(x\,X2) (x = [x±, 
X2]). The density /(•) satisfies: 



We assume that the number of initial outstanding demands is a random variable with finite first and second moments. 

At each demand location, vehicles spend some time s in on-site service that is i.i.d. and generally distributed 
with finite first and second moments denoted by s and s 2 . A realized demand is removed from the system after 
one of the vehicles has completed its on-site service. We define the load factor g = As/to. 

The system time of demand j, denoted by Tj, is defined as the elapsed time between the arrival of demand j and 
the time one of the vehicles completes its service. The waiting time of demand j, Wj, is defined by Wj = Tj — Sj. 
The steady-state system time is defined by T = limsupj^^ E [Tj]. A policy for routing the vehicles is said to be 
stable if the expected number of demands in the system is bounded uniformly at all times. A necessary condition 
for the existence of a stable policy is that g < 1; we shall assume g < 1 throughout the paper. When we refer 
to light-load condition, we consider the case g — > + , in the sense that A — > + ; when we refer to heavy-load 
condition, we consider the case g — > 1~, in the sense that A — > [l/s]~. 

Let V be the set of all causal, stable and stationary routing policies. Letting T n , denote the system time of a 
particular policy n e V, the DTRP is then defined as the problem of finding a policy tt* (if one exists) such that 




\/S C Q, and 




= inf T 7 



We let T* denote the infimum in the right hand side above. 
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We finally need two definitions; in these definitions, X is the location of a randomly chosen demand and W is 
its waiting time. 

Definition 3.1: A policy tt is called spatially unbiased if for every pair of sets 6>i, S2 Q Q 

E[W\X € S 1 ]=E[W\X € S 2 }; 
a policy tt is instead called spatially biased if there exists sets Si, 52 C Q such that 

E[W|x e Si] > E[w \x e S 2 \. 

In this paper we focus on unbiased policies; indeed, unbiased service seems to be a natural constraint in most 
applications of interest, e.g., surveillance by a team of UAVs. We are interested in distributed, scalable, adaptive 
control policies with provable performance guarantees, that rely only on local exchange of information between 
neighboring vehicles, and do not require explicit knowledge of the global structure of the network. 

B. Existing Results for the m-DTRP 

As in many queueing problems, the analysis of the to-DTRP problem for all values of q S [0, 1) is difficult. 
In [1 1] — [13], [26], lower bounds for the optimal steady-state system time are derived for the light-load case (i.e., 
q — > + ), and for the heavy-load case (i.e., q — > 1~). Subsequently, policies are designed for these two limiting 
regimes, and their performance is compared to the lower bounds. 

1) Lower Bounds: For the light-load case, a tight lower bound on the system time is derived in [12]. In the 
light-load case, the lower bound on the m-DTRP system time is strongly related to the solution of the m-median 
problem: 

f * > -H* n {Q) + s, as q^0+. (3) 

v 

This bound is tight: there exist policies whose system times, in the limit q — > + , attain this bound; we present 
such asymptotically optimal policies for the light-load case below. 

The following two lower bounds hold in the heavy-load case [13], [26]. Within the class of spatially unbiased 
policies the optimal system time is lower bounded by 



2 A 



j* > /^TSP,2 



Lf 1 '\x)dx 



2 



m 2 v 2 (1 — q) 2 

Within the class of spatially biased policies the optimal system time is lower bounded by 



e ->i- (4) 



T * > .^TSP,2 



as q — * 1 . (5) 



2 m 2 v 2 (1 — q) 2 

Both bounds Q and |5]) are tight: there exist policies whose system times, in the limit q — > 1~, attain these bounds; 
therefore the inequalities in Q and <|3j should indeed be replaced by equalities. We present asymptotically optimal 
policies for the heavy-load case below. It is shown in [13] that the lower bound in Eq. <j3j is always lower than or 
equal to the lower bound in Eq. Q for all densities /. 
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Remark 3.2: In equations Q and (|5J, the right-hand side approaches +00 as g — > 1~. Thus, one should more 
formally write the inequalities with f*(l — p) 2 on the left-hand side, so that the right-hand side is finite. However, 
this makes the presentation less readable, and thus, in the remainder of the paper we adhere to the less formal but 



DTRP (see, e.g., [13] Section 4 and [27] Proposition 2). 

Remark 3.3: It is possible to show (see [13], Proposition 1) that a uniform spatial density function is the worst 
possible and that any deviation from uniformity in the demand distribution will strictly lower the optimal mean 
system time in both the unbiased and biased case. In addition, allowing biased service will result in a strict reduction 
of the optimal mean system time for any non-uniform density /. Also, when the density is uniform there is nothing 
to be gained by not providing unbiased service. 

2) Optimal Policies for the Light Load Case: For the light-load case, to the best of our knowledge, there is only 
one policy that is known to be optimal; this policy is as follows. 

The m Stochastic Queue Median (SQM) Policy [12] — Locate one vehicle at each of the m median 
locations for the environment Q. When demands arrive, assign them to the nearest median location and 
its corresponding vehicle. Have each vehicle service its respective demands in First-Come, First-Served 
(FCFS) order returning to its median after each service is completed. 
This policy, although optimal in light load, has several drawbacks: (i) it does not stabilize the system as the load 
increases, (ii) it relies on the solution of the continuous rri-median problem that is hard for m > 1, (iii) it relies 
on a priori computation and, thus, is not adaptive to changes in the environment, and, finally, (iv) it requires a 
centralized assignment to the median locations. 

3) Optimal Policies for the Heavy Load Case: For the heavy-load case, to the best of our knowledge, there is 
only one unbiased policy that has been proven to achieve the lower bound in Q; this policy is as follows. 

The Unbiased TSP (UTSP) Policy [13] — Let r be a fixed positive, large integer. From a central point in 
the interior of Q, subdivide the service region into r wedges Q%, Q2, ■ ■ ■ , Q r such that J Q f(x)dx = 1/r, 
i = 1,2, ■ ■ ■ ,r. Within each subregion, form sets of size n/r (n is a design parameter). As sets are formed, 
deposit them in a queue and service them FCFS with the first available vehicle by forming a TSP on the 
set and following it in an arbitrary direction. Optimize over n. 
It is possible to show that 



Thus, letting r — ► 00, the lower bound in Q is achieved. 

The same paper [13] presents an optimal biased policy. This policy, called Biased TSP (BTSP) Policy, relies on 
an even finer partition of the environment and requires / to be piecewise constant. 

Although both policies are optimal within their respective classes, they have the following drawbacks: (i) they 
do not perform well in light load, (ii) the parameter n is selected a priori and depends on the data of the problem, 



more transparent style of equations Q and Q. Furthermore, this is the style generally used in the literature on the 




as 
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(iii) they require a centralized data structure (the demands' queue is shared by the vehicles), (iv) the partition of 
the environment is computed a priori and clearly depends of /. 

C. Toward Distributed, Scalable, and Adaptive Control Policies for the m-DTRP 

A candidate distributed, scalable, and adaptive control policy for the m-DTRP is the simple Nearest Neighbor 
(NN) policy: at each service completion epoch, each vehicle chooses to visit next the closest unserviced demand, if 
any, otherwise it stops at the current position. Because of the dependencies among the travel inter-demand distances, 
the analysis of the NN policy is difficult and no rigorous results have been obtained so far [11]; in particular, its 
stability properties are not known. Simulation experiments show that the NN policy performs like a biased policy 
and is not optimal neither in the light-load case nor in the heavy-load case. In particular, when / is uniform, in 
light load the steady-state system time under the NN policy is 40% larger than the steady-state system time under 
the (optimal) SQM policy, while in heavy load the steady-state system time under the NN policy is 60% larger 
than the steady-state system time under the (optimal) Biased TSP policy [11], [13]. Therefore, the NN policy lacks 
provable performance guarantees (in particular about stability), is biased and does not seem to achieve optimal 
performance neither in light load nor in heavy load. 

The key idea that we will pursue in this paper is that of partitioning policies. 

Definition 3.4: Given a policy tt for the 1-DTRP and m vehicles, a tt -partitioning policy is a family of multi- 
vehicle policies such that 

1) the environment Q is partitioned into m openly disjoint subregions Qk, k € {1, . . . , m}, whose union is Q, 

2) one vehicle is assigned to each subregion (thus, there is a bijective correspondence between vehicles and 
subregions), 

3) each vehicle executes the single-vehicle policy tt to service demands that fall within its own subregion. 



Note that Definition 3.4 does not specify how the environment is actually partitioned, therefore Definition 3.4 
describes a family of policies (one for each partitioning strategy) for the m-DTRP. The SQM policy, which is 
optimal in light load, is indeed a partitioning policy whereby Q is partitioned according to a median Voronoi 
diagram that "minimizes" H m , and each vehicle executes inside its own Voronoi cell the single-vehicle policy 
"service FCFS and return to the median after each service completion". Moreover, in Section [VI] we will prove that 
in heavy load, given a single-vehicle, unbiased optimal policy tt* , there exists an unbiased tt* -partitioning policy 
that is optimal. 

The above discussion leads to the following strategy: First, for the 1-DTRP, we design unbiased, scalable, and 
adaptive control policies with provable performance guarantees. Then, by using recently developed distributed 
algorithms for environment partitioning, we extend these single-vehicle policies to the multi-vehicle case. 

IV. The Single-Vehicle Divide & Conquer Policy 

We describe the Divide & Conquer (DC) policy in Algorithm [T] where D is the set of outstanding demands 
and p is the current position of the vehicle. An r-partition that is simultaneously equitable with respect to f(x) 
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Algorithm 1: Divide & Conquer (DC) Policy 

Assumes: An r -partition {Qk}k=i °f Q tnat i s simultaneously equitable with respect to f(x) and 

f 1/2 (x)/J Q f 1/2 (x) dx. 

1 if D = then 

2 Let Pi be the point minimizing the sum of distances to demands serviced in the past (the initial condition 
for P* is a random point in Q); if p ^ P*, move to P*, otherwise stop. 

3 else 

4 fc <— k Q . II ko is uniformly randomly chosen in the set {l,...,r}. 

5 repeat 

6 Move to the median of subregion Qj.. 

7 Compute the TSP tour through all demands in subregion Qk- Service all demands by following the 
TSP tour, starting at a random point on the TSP tour. 

8 k <— k + 1 modulo r. 

9 until D = 

to Repeat. 



and f 1 ^ 2 (x)/ Jq f 1 ^ 2 (x) dx is indeed guaranteed to exist (see discussion on equitable partitions in Section |n|. The 
number of subregions r is a design parameter whose choice will be discussed after the analysis of the policy. One 
should note that Pj* is indeed the empirical median. Next, we characterize the DC policy. Notice that the DC policy 
is an unbiased policy. 

A. Analysis of the DC Policy in Light Load 

We first study the light-load case (i.e., q — * + ). 
Theorem 4.1 (Performance of DC policy in light load) 
optimal, that is, 

f DC {r)^T\ 

Proof: Using arguments similar to those in [27], consider generic initial conditions in Q for the vehicle's 
position and for the positions of the outstanding demand (denote the initial number of demands with n ). Under the 
DC policy, an upper bound to the time Co needed to service all of the initial demands and move to the empirical 
median is: C < (n + l) diam(Q)/^ + ]T™° i Sj. Thus, we have E [C ] < E [n ](diam(Q)/-i; + s) + diam(Q)/t). 

Given time interval Cq = t, t £ M>o, the probability that no new demand arrives during such time interval is 
P(N(Cq) = 0\Cq =t) = exp(-Xt), where N(t) is the counting variable associated to a Poisson process with rate 



: As q — > + , for every r, the DC policy is asymptotically 

as q — * + . 
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A. Hence, by applying the law of total probability and Jensen's inequality for convex functions 

/>oo />oo />oo 

P [N(C ) = 0] = / P (N(C ) = 0\C = t)f Ca {t)dt = / exp{-Xt)f Co (t)dt > exp(-A / tf Co (t)dt 

Jo Jo v Jo ' (g) 

= exp(-AE[C ]) > exp(-XE[n ](diam(Q)/v + s) - Adiam(Q)/A 
where fc is the density function of random variable Co. Since, by assumption, E [no] < oo and g — > + (i.e., 
A — > + ), we obtain P[iV(Co) = 0] — > 1~ as p — > + . As a consequence, after an initial transient, all demands 
will be generated with the vehicle at the empirical median P£, and an empty demand queue, with probability 1. 

Thus, after an initial transient, the DC policy, in light load, reduces to line [5] with the condition D — true with 
probability 1. Then, in light load, the DC policy becomes identical to the light-load routing policy presented by 
one of the authors in [28]; since such routing policy is optimal, in light load, for the 1-DTRP [28], we obtain the 
desired result (the proof in [28] basically shows that P* — > P*). ■ 

B. Analysis of the DC Policy in Heavy Load 

Next, we consider the DC policy in heavy load, i.e., when g — > 1~. The performance of the DC policy in heavy 
load is characterized by the following theorem. 

Theorem 4.2 (Performance of DC policy in heavy load): As g — > 1~, the system time for the DC policy satisfies 



Tbc(r) < ( 1 
where r is the number of subregions. 



v 2 (1 - gf 



2 

(7) 



To prove Theorem 4.2 we use techniques similar to those we developed in [29], [30]. We start the analysis with the 
following Lemma, similar to Lemma 1 in [27], characterizing the number of outstanding demands in heavy load. 

Lemma 4.3 (Number of outstanding demands in heavy load for DC policy): In heavy load (i.e., g — > 1 — ), after 
a transient, the number of demands serviced in a single tour of the vehicle in the DC policy is very large with high 
probability (i.e., the number of demands tends to +oo with probability that tends to 1, as g approaches 1~). 

Proof: Consider the case where the vehicle moves with infinite velocity (i.e., v — > +oo); then the system is 
reduced to the usual M/G/l queue. The infinite-velocity system has fewer demands waiting in queue. A known 
result on M/G/l queues [31] states that, after transients, the total number of demands, as g — * 1~, is very large 
with high probability. Thus, the number of outstanding demands in each subregion is also very large with high 
probability. In particular, after transients, the number of demands is very large with high probability at the time 
instants when the vehicle starts a new TSP tour. ■ 



Lemma 4.3 has two implications. First, since the number of demands is very large at the time instants when the 
vehicle starts a new TSP tour, we can apply Eq. ([T| to estimate the length of a TSP tour. Second, since D ^ 
with high probability, the DC policy, in heavy load, reduces to lines 5|9 of Algorithm [I] with the condition D = 



always false. 

We refer to the time instant U, i> 0, in which the vehicle starts a new TSP tour in subregion 1 as the epoch i of 
the policy; we refer to the time interval between epoch i and epoch i + 1 as the zth iteration. Let nj 5 , k £ {1, . . . , r}, 
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be the number of outstanding demands serviced in subregion k during iteration i. Finally, let C\ be the time interval 
between the time instant the vehicle starts to service demands in subregion k during iteration i and the time instant 
the vehicle starts to service demands in the same subregion k during next iteration i+1. Demands arrive in subregion 
Qk according to a Poisson process with rate A = A/r, where we use the fact that the partition {<2fc}£=i * s equitable 
with respect to /; then, we have E [Vij +1 ] = A E [C 4 fc ] . The time interval C* is the sum of three components: 

1) the time for the r travels from the end of one TSP tour to the start of next one; 

2) the time to perform r TSP tours; 

3) the time to provide on-site service while performing the r TSP tours. 



Assume that i is large enough (say, i > i) so that, according to Lemma 4.3 the number of outstanding demands 
is large. The first component is trivially upper bounded by r diam(Q)/i>. Given that a demand falls in subregion 
Qk, the conditional density for its location (whose support is Qk) is f(x)/ Jq f(x) dx; then, the expected length 
of a TSP tour through n\ demands (with a slight abuse of notation, we call such length TSP(n^)) can be upper 
bounded as 



E [TSP{n1)} < /? TSP , 2 / J r dx JE [„*] = /3 TSP , 2 ^ JQ Je [„*] (8) 




where we use Eq. ([T}, we apply Jensen's inequality for concave functions in the form E \[X < ^/E [X], and 
we exploit the fact that, by definition of the DC policy, the partition {Qk}k=i i s simultaneously equitable with 
respect to f(x) and f 1/2 (x)/ J Q f 1/2 (x) dx. Define (3 = /3 T sp,2 Sq - ', equation ^ becomes E [TSP(n^)] < 



n k i+1 = AE [Cf] < X(r diam(Q)/ V + ^ + t ^^^+3^ + 5^ <i) ' k€ {!,..., r}. 



/3y E [nf] . Finally, since on-site service times are independent of the number of outstanding demands, the expected 
on-site service requirement for demands is sE [nf]. 

Then, we obtain the following recurrence relation (where we define n 1 * = E \rq\ ): 

n r . n k—1 . r k—X 

j=k j=l j=k j = l 

(9) 

The r inequalities above describe a system of recurrence relations that allows to find an upper bound on 
limsup i „ >+oc n\. The following Lemma bounds the values to which the limits lim supj_,. +00 converge. 

Lemma 4.4 (Steady state number of demands for DC policy): In heavy load, for every set of initial conditions 
{^i}fc£{i,..., r }> the trajectories i i— > fq, k € {1, . . . , r}, resulting from Eq. Q, satisfy 



A 2 

n k = lim sup < /?tsp,2 — 

i — >+oo 



J Q f^(x)dx] 2 
r v 2 (1 — g) 2 



Proof: By Lemma 4.3 n\ tends to +oo with probability that tends to 1, as g approaches 1~; then, after a 
transient, the term rdiam(Q)/v is negligible compared to the other terms in the right hand side of Eq. (|9j, and 
therefore it can be ignored (its inclusion in the proof is conceptually straightforward, but makes the analysis more 
cumbersome). 
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Next we define two auxiliary systems, System-Y and System-Z. We define System-Y (with state y(i) 6 R r ) as 
Vk(i + 1) = A( £(sy 3 -(») + £ yy^)) + + !) + \ + !)) ) > fce{l,...,r}. (10) 

System-Y is obtained by replacing the inequality in Eq. (|9) with an equality. Pick, now, any e > 0. From Young's 
inequality 

Ja < — + ea, for all a 6 M> - 
4e 

By applying Young inequality in Eq. ( fTO) we obtain 

!/*(*+!) <A(s + /) (£>(*) + £>(* + !) J + r ^, ke{l,...,r}. (11) 



\ / \J=fe 3=1 

Next, define System-Z (with state S K r ) as 



%(i+l) = A^+e^^« i (i) + 5^2r i (i+l)^+r^, fe 6 {1, . . . ,r}, (12) 

The proof now proceeds as follows. First, we show that if < yk(i) < for all k, then 

Uk(i) < 2fc(i), for all k and for all i > i. (13) 

Second, we show that the trajectories of System-Z are bounded; this fact, together with Eq. (13) , implies that 
also trajectories of variables n\ and yk(i) are bounded. Third, and last, we will compute limsup i _ >+00 yk(i)', this 
quantity, together with Eq. ( |13) , will yield the desired result. 

Let us consider the first issue. Given i, we prove that if rq < yk{i) < Zk{i) for all k S {l,...,r}, then 
< + 1) < + 1). Indeed, it is immediate to show that, under the assumptions, it holds: nh 1 < 
yi(i + 1) < zi(i + 1); moreover, it is easy to show that, under the assumptions, and if fi\ +1 < yj(i + 1) < zj(i + 1) 
for all j < k < r, then < Vk+i(i + 1) < Zk+i(i + 1). The claim follows by induction on fc. Equation (JT3J is 
a consequence of the statement we have just proven. 

We now turn our attention to the second issue, namely boundedness of trajectories for System-Z (described in 
Eq. (flZ)). Define S = X{s + ef3/v). System-Z is a discrete-time linear system and can be rewritten (by adding and 
subtracting Zk-i(i) in the second term between parentheses in Eq. ( fl2| ), when k > 1) as 

5E-=i^W+^i£ forfc=l, 
(l + tf) z k -i(i + l) -Sz k -i(t) for fee {2,...,r}. 
By using the above equation and by simple induction arguments, it is possible to show that System-Z can be written 
in canonical form as 

* fc (i +!) = (! + J)*" 1 (sj2^) + r^)-J2 + 



z k (i + 1) 



Ave) 

3 = 1 3=1 

Hence, System-Z can be written in compact form as 



z(i + l) = (A + B(e)^z(i) + c(s), 



March 23, 2009 



DRAFT 



14 



where A £ M rxr , B(e) £ W xr and each element of B(e), say b(e)kj, has the property lim £ ^ + b(e)kj = for all 
k,j £ {1, . . . ,r}, and c(e) € ]R rxl (notice that c(e) is well defined for every e > 0). It is easy to see that matrix 
A is 



Q 



(l + Q) 



fc-i 



(l + g) 



k-j-i 



for 1 < j < k-1, 
g(l + g) k ^ 1 otherwise, 

where g = As. Notice that, since g > 0, we have |ajy| = a,kj for all k and j. Then, for each k £ {1, . . . ,r}, we 
have 

r r fe— 1 

2 |o fci | = X! «jw = ^ (i + e)^ 1 + = + ^ < 

since g = Xs/r and g = As < 1 by assumption. Therefore, the L°°-induced norm of matrix A satisfies: H-AHoo = 
max/; Y^j=i \ a kj \ < 1- Since any induced norm || • || satisfies the inequality p(A) < \\A\\, where p(A) is the spectral 
radius of A, we conclude that A £ M. rxr has eigenvalues strictly inside the unit disk (i.e., A is a stable matrix). 
Since the eigenvalues of a matrix depend continuously on the matrix entries, there exists a sufficiently small s > 
such that the matrix A+eB has eigenvalues strictly inside the unit disk. Accordingly, each solution i i— ► z(i) £ K> 
of System-Z converges exponentially fast to the unique equilibrium point 

(14) 

n(i) and i i— > y(i) are bounded 



I r — A - B{e)j c{e). 

Combining Eq. ( fl3| l with the previous statement, we see that the solutions 
(where n(i) = (nj, . . . , n r i )~). Thus 



limsupn(i) < limsupy(i) < +oo. 

i — >+oo i — >+oo 



(15) 



Finally, we turn our attention to the third issue, namely the computation of y = limsup i _^ +00 y(i). Taking the 
limsup of the left- and right-hand sides of Eq. ( flQ| , and noting that 



lim sup Jy 3 (i) = /lim sup y 3 - (z) for j £ {1, . . . , r}, 

i — !-+oo y i — >+oo 

since yf* is continuous and strictly monotone increasing on R>o> we obtain that 



T 

therefore we have y^ = yj for all k,j £ {1, . . . , r}. Substituting in Eq. ( [To} and rearranging we obtain 







(16) 



A 2 /? 2 



JJk 



v 2 (l - g) 2 ' 

Noting that from Eq. ( fl"5j ), limsup i _ >+00 n\ < yk, we obtain the desired result. 



We are now in a position to prove theorem 4.2 



Proof of Theorem 4.2 ■ Define C k = limsup^^ E [C k ] ; then we have, by using the upper bound on E [Cf] 
in Eq. d9j, 

r 

C k = lim sup E [C k ] < 

3= 1 - : I 



(3 r— _v-_, r\p 2 grX(3 2 

f~ij3 j ^ \ T\j j 

v 4-~t ~v 2 (\-g) v 2 (l-g) 2 ' 
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Hence, in the limit g — ► 1 , we have 

C k < 



r\(3 2 



X(3? 



TSP, 2 



f f 1/2 (*)dx 



v 2 (1 - g) 2 v 2 (1 - g) 2 

Consider a random demand that arrives in subregion k. Its expected steady-state system time, T , will be upper 
bounded, as g — > 1~, by 



1 



1 



7* < _C" C + -sf < -C* 



1 



A/3 2 



2 ~ 1 2"" " 2 " ' 2i> 2 (l-e) 2 ' 
where we used the fact that, as g — > 1~, the travel time along a TSP tour is negligible compared to the on-site 
service time requirement. Unconditioning, we obtain the claim 



T < 1 



1\ ^A?SP 2 



f Q f 1/2 (x)dx 



2v 2 (1 - gf 



C. Discussion 



The DC policy is optimal in light load (Theorem 4.1 1; moreover, if we let r — ► oo, the DC policy achieves the 
heavy-load lower bound Q for unbiased policies (Theorem 4.2 1. Therefore the DC policy is both optimal in light 
load and arbitrarily close to optimality in heavy load, and stabilizes the system in every load condition (since it 
stabilizes the system in the limit g — > Notice that with r = 10 the DC policy is already guaranteed to be 
within 10% of the optimal (for unbiased policies) performance in heavy load. 

The DC policy does not rely on any parameter that is a function of the problem data: r is simply a design factor 
whose choice is independent of A, s, v and /. However, if r > 1, the computation of a simultaneously equitable 
r-partition of Q does depend on /. Therefore, if r — > +oo, the policy is (i) provably optimal both in light and 
heavy load, and (ii) adaptive with respect to arrival rate A, expected on-site service s, and vehicle's velocity v. If, 
instead, r = 1, the policy is (i) provably optimal in light load and within a factor of 2 of optimal performance 
in heavy load, and (ii) adaptive with respect to arrival rate A, expected on-site service s, vehicle's velocity v, and 
spatial density /; in other words, when r = 1, the DC policy adapts to all problem data. 

An approximate algorithm to compute, with arbitrarily small error, an r-partition that is simultaneously equitable 
with respect to two measures can be found in [14] (specifically, see discussion on page 621); in the particular case 
when the density / is uniform, the problem of finding an r-partition that is simultaneously equitable with respect to 
f(x) and / 1 / 2 (x)/ J Q J 1 / 2 (a;) dx becomes trivial. Given a simultaneously equitable r-partition of Q, the subregions 
can be indexed using the following procedure: (i) a TSP tour through the medians of the subregions is computed, 
(ii) subregions are indexed according to the order induced by this TSP tour. In practice, the DC policy would be 
implemented by allowing the vehicle to skip subregions where no demand is present and by directly moving from 
one TSP to the next one, i.e., by skipping the median of subregion Qk in line [6] (the "shortcuts" were not included 
in the presentation of the DC policy to make its analysis easier). 
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Note that if we assume that / is known (as it must be the case for the implementation of the DC policy with 
r > 1), then instead of using the empirical mean P* one should directly use the exact median P* (which is the 
solution of a simple convex problem). 

Finally, the DC policy with r = 1 is similar to the generation policy presented in [27]; in particular, the generation 
policy has the same performance guarantees of the DC policy with r = 1. However, the generation policy is analyzed 
only for the case of uniform spatial density /, and its implementation does require the knowledge of / (while, as 
discussed before, the DC policy with r = 1 adapts to all problem data, including /). The part of Algorithm [T] in 
lines |5|9] is similar to the PART-TSP policy presented in [26]; however, the PART-TSP policy is only informally 
analyzed by assuming steady-state conditions, in particular no proof of stability and convergence is provided. 

In the next section we present and analyze another single-vehicle policy, the Receding Horizon (RH) policy, that 
applies ideas of receding-horizon control to dynamic vehicle routing problems. 

V. The Single-Vehicle Receding Horizon Policy 

Next, we describe the Receding Horizon (RH) policy. In what follows, D is the set of outstanding demands and 
p is the current position of the vehicle. Moreover, given a tour T of D, a fragment of T is a connected subgraph 
of T . 

Algorithm 2: Receding Horizon (RH) Policy 
input: Scalar rj G (0, 1) 

1 if D = then 

2 Let P£ be the point minimizing the sum of distances to demands serviced in the past; if p ^ P-j\ move to 
P*, otherwise stop. 

3 else 

4 repeat 

s Compute the TSP tour through all demands in D. 

6 Uniformly randomly choose, independently from the past, a fragment of length r\ TSP(D) of such 
TSP tour. 

7 Service the selected fragment starting from the endpoint closest to the current position (if the entire 

tour is selected, start from the closest demand). 

8 until D = 

9 Repeat. 



In other words, if D ^ 0, the vehicle selects a random //-fragment of the TSP tour through the demands in D 
(i.e., a fragment of length 77 TSP(D) of such tour); note that the horizon is not fixed, but it is adjusted according 
to the cost of servicing the outstanding demands. In general, the performance of the system will depend on the 
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choice of the parameter 77, which will be discussed after the analysis of the policy. Notice that the RH policy is an 
unbiased policy. 

A. Stability and Performance of the RH Policy 

The RH policy is attractive since it can be implemented without any knowledge of the underlying demand 
generation process (in particular, without any knowledge of /). On the other hand, an important caveat of receding 
horizon control is that closed-loop stability is not guaranteed in general. In this section, we study the stability of 
the RH policy and we discuss its performance. 

1 ) Stability of the RH Policy: In general, the stability properties of receding horizon control deteriorate as the 
horizon becomes shorter; remarkably, the RH policy is stable in every load for every < 77 < 1. 

Theorem 5.1: As g — > 1~, the system time for the RH policy satisfies 

A/3S,|Q| 

T RH (v) < y2 (i'_ g )2 ' for a11 *> e (°> *)> 
where 0qa is the constant appearing in Eq. Q. 



Proof: By following the same arguments as in Lemma 4.3 it is easy to show that under the RH policy, after 



a transient, the number of demands at the instants when a new TSP tour is computed is very large with high 



probability. Then, in heavy load, D ^ with high probability, and the RH policy reduces to lines 4||8 with the 
condition D = always false. 

We refer to the time instant t.- L , i > 0, in which the vehicle computes a new TSP tour as the epoch i of the policy; 
we refer to the time interval between epoch i and epoch i + 1 as the 7th iteration and we refer to its length as the 
cost Cj. Let rii be the number of outstanding demands at epoch i; with a slight abuse of notation, we denote by 
TSP(rii) the length of the TSP tour through the outstanding demands at epoch i. 



As in the proof of Lemma 4.4 the heavy-load assumption allows us to neglect the time to travel from the end of 
one TSP tour to the start of next one. Since an 77-fragment is randomly chosen, the expected number of demands 
left unserviced during iteration 7 is (1 — 77) E [iii); then, 

E[n i+ i]= AE[Ci] + (l-ry)EH 



newly arrived demands demands left unserviced during iteration i 

The expected number of demands that receive service during iteration i is 77 E [m] . Then, the expected time-length 
of iteration i is given by 

e [d] = s V e [m] + - e [TSP( ni )] <s V e [m] + 77 (3 Q , 2 V\Q\ ^eR, (17) 

V V 

where we have applied the deterministic bound in Eq. |2]i and Jensen's inequality for concave functions. Therefore, 
we obtain 



77 

C [TljJ -I- A 

It is then straightforward to show that 



E[n t+1 ] < qt)E [m] + A ^ (3 Q>2 v^QI ¥i] + (1 - v) E \ n i 



A 2 /3| 2 |Q| 

n = hmsup E [m] < ' — -r, for all 77 E (0, 1). 

i^oc V z (1 - QY 
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From Little's law, n — A Trh, and thus 



Trh < -irjf Y2 for a11 V G (°> 1 ■ 

tr (1 — 



Since, by Theorem 5.1 the RH policy stabilizes the system in the limit g — > 1~, it also stabilizes the system for 
any load g G [0, 1). 

2) Performance of the RH Policy: In light load, the RH policy becomes identical to the DC policy, and therefore 
it is optimal (see Theorem |4.1| i. 

For the case g — > 1~, because of the dependencies among demands' locations, we were unable to obtain upper 
bounds on the system time that are both tight and rigorous. However, we now present an heuristic analysis of the 
RH policy that provides interesting insights on its behavior and suggests bounds on its heavy-load system time that 
are well matched by simulation results. 

The service of an ^-fragment introduces dependencies among the locations of the demands that are left unserviced; 
therefore, even though the number of demands, as g — > 1~, becomes large with probability 1, the result in Eq. ([T]) 
(that requires a set of independently distributed demands) formally does not hold. However, due to the randomized 
selection of the 77-fragments, it is quite natural to conjecture that Eq. ([TJ still provides a good estimate on the 
lengths of the TSP tours that are computed. The rationale behind this conjecture is the following: When r\ is close 
to 1, most of the outstanding demands are serviced during one iteration, and the next TSP tour is mostly through 
newly arrived demands, that are independent by the assumptions on the demand generation process; when, instead, 
r\ is close to zero, the randomized selection of short fragments only introduce "negligible" correlations. 

The above observations motivate us to use /3tsp,2 J q f 1 ' 2 ^) dx (as it would be dictated by Eq. ([T| ) in Eq. (17) , 
instead of Pq.2\/\Q\ (as it is dictated by Eq. Q). Then, by repeating the same steps as in the proof of Theorem 



5.1 we would obtain the following result 



xa 



TSP, 2 



f f^{x)dx 



T m (v) < v L 2 ( j _ g)2 — , for all r, e (0, 1). (18) 

The upper bound in (jT8j is tighter than the upper bound in Theorem |5.1| (notice that by Jensen's inequality for 

2 



convex functions 



Jo f 1/2 (x) dx < J Q {f 1,2 ) 2 {x) dx = 1), but unlike the upper bound in Theorem 



5.1 



it has 



not been established formally. Simulation results (see Section VII 1 indeed confirm the upper bound in Eq. ( [18) . 

Finally, note that when 77 is close to zero, the RH policy is conceptually similar to the DC policy with r — > 00, 
so we also speculate that in heavy load the performance of the RH policy improves as the horizon becomes shorter. 
Simulation results (see Section |VII[i indeed confirm this behavior. 



B. Discussion 

The RH policy is optimal in light load and stabilizes the system in every load condition (these two statements 
have been established rigorously). The implementation of the RH policy does not require the knowledge of A, s, v 
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and /. Therefore, the RH policy adapts to arrival rate A, expected on-site service s, vehicle's velocity v, and spatial 
density /; in other words, the RH policy adapts to all problem data. 

In Section [TV] we saw that also the DC policy with r = 1 adapts to all problem data. Simulation results (see 



Section VII I show that in heavy load the RH policy with a short horizon 77 (say, 77 w 0.2) performs better than the 
DC policy with r = 1 (the light-load behavior is clearly the same). Therefore, to date, the best available policy for 
the 1-DTRP that is unbiased and adapts to all problem data seems to be the RH policy with 77 as 0.2. 

It is natural to wonder how the RH policy performs if the 77-fragment is not selected randomly, but it is instead 
selected according to the rule: Find the fragment of length r)TSP(D) that maximizes the number of reached 



demands. In other words, the vehicle now looks for a maximum-reward 77-fragment. Clearly, Theorem 5.1 still 
applies. Simulation results (see [32]) show that this variant of the RH policy appears to behave neither like a pure 
unbiased policy nor like an optimal biased policy; rather, its performance seems to lie between these two extremes. 

VI. Adaptive and Distributed Policies for the ?n-DTRP 

In this section we extend the previous single-vehicle policies to the multi-vehicle case. First, we prove that in 
heavy load, given a single -vehicle, unbiased optimal policy tt* , there exists an unbiased tt* -partitioning policy that 
is optimal. 

This result and the optimality of the SQM policy (which is a partitioning policy that is optimal in light load) 
lead us to propose, for the multi-vehicle case, partitioning policies that use the DC policy or the RH policy as 
single-vehicle policies. Finally, we discuss how a partition can be computed in a distributed fashion, so that the 
proposed policies are amenable to distributed implementation. 



A. Optimality of Partitioning Policies in Heavy Load 
The following theorem holds. 

Theorem 6.1: In heavy load, given a single-vehicle, unbiased optimal policy tt* and m vehicles, there exists an 
unbiased tt* -partitioning policy that is optimal for the m-DTRP. 

Proof: Let tt* be an optimal unbiased policy (e.g., the DC policy with r — > 00). Construct an m-partition 
°f Q tnat i s simultaneously equitable with respect to f[x) and f 1 ^ 2 (x)/ j Q / 1 / 2 (x) dx (such partition is 
guaranteed to exist as discussed in Section assign one vehicle to each subregion. Each vehicle executes the 
single-vehicle unbiased policy tt* to service demands that fall within its own subregion. The probability that a 
demand falls in subregion Qj is equal to J Q f(x)dx = 1/m. Notice that the arrival rate \ in subregion Qi is 
reduced by a factor J Q f(x) dx, i.e., A; = A J Q f(x) dx — A/m; therefore, the load factor in subregion Qi (where 
only one vehicle provides service) is Qi — XiS = Xs/m = q < 1, in particular the necessary condition for stability 
is satisfied in every subregion. Finally, given that a demand falls in subregion Qi, the conditional density for its 
location (whose support is Qi) is f(x)/ Jq f(x) dx — m f(x). Clearly, such multi-vehicle policy is still unbiased. 
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Then, by the law of total probability we have 




(3? 



TSP,2 



A 



fof 1/2 (x)dx 



2 m 2 v 2 (1 — g) 2 
Thus, the bound in Q is achieved. 



Remark 6.2: Theorem 6.1 proves Conjecture 1 (made for uniform spatial density /) in [12]. 

Remark 6.3: A similar result can be proven for biased policies; in this case the m-partition should be simulta- 
neously equitable with respect to f(x) and / 2 ^ 3 / Jq f 2 ^ 3 (x) dx). 

Remark 6.4: Interestingly the proof of Theorem |6.1| uses, for unbiased policies, m-partitions that are simulta- 
neously equitable with respecj^] to / and f 1 ^ 2 . We next show that a n* -partitioning policy that uses a partition 
equitable with respect to f 1 / 2 might be unstable, while a 7r* -partitioning policy that uses a partition equitable with 
respect to / is in general not optimal but has a constant factor guarantee of m. 

Theorem 6.5: In heavy load, given a single-vehicle, unbiased optimal policy tt* , a tt* -partitioning policy that 
uses an m-partition that is equitable with respect to J 1 / 2 may be unstable. 

Proof: The arrival rate in subregion Qi is A, = A J Q f(x) dx, and therefore the load factor in subregion Qi 
(where only one vehicle provides service) is gi = A J Q f(x) dxs. In general, an m-partition that is equitable with 
respect to f 1 / 2 is not equitable with respect to / (unless the density is uniform); if this is the case, let Qj be the 
subregion such that J Q _ f(x) dxs = 1/m + e, for some e > 0; then, qj — g + eXs. Therefore, as g — > 1~, demands 
in subregion Qj will experience unbounded system times (for values of g strictly less than one). ■ 

Theorem 6.6: In heavy load, given a single-vehicle, unbiased optimal policy 7r*, a 7r* -partitioning policy that 
uses an m-partition that is equitable with respect to / is within a factor m of the optimal, and in general is not 
optimal. 

Proof: Let it* be an optimal unbiased policy. The environment Q in partitioned into m subregions Qi, i = 
1, . . . , m, such that J Q f(x) dx = j Q f(x) dx/m = l/m for all i, and one vehicle is assigned to each subregion. 
Each vehicle executes the single-vehicle policy it* to service demands that fall within its own subregion. Clearly, 



such multi-vehicle policy is unbiased. As in the proof of Theorem 6.1 we have, by the law of total probability 

2 . 



T 




(19) 



check 
equality 



With a slight abuse of notation we are omitting the normalization constant. 
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Define the vector 

x = ( [ f 1/2 (x) dx, f f^ 2 (x) dx,..., [ f/ 2 (x) dx) ; 
\jQi JQi JQ m J 

we will denote by ||a;|| 2 and \\x\\i the 2-norm and the 1-norm, respectively, of vector x. Notice that ||a;||i 
Jq f 1 ' 2 (x) dx. Since for any vector x 6 R m it holds ||x||i < \/m||a;|j2, Eq. < [T9| > becomes 

1 ^TSP,2 A .. .-I /?| S P,2 A \\x\\l 



T = — ' — — \\x\\ > — 

m 2 v 2 (1 — g) 2 ~ rn 2 v 2 (1 — g) 2 m 

2 



;2 A 

TSP,2 



f a f l '*{x)dx 



— T* as q -► 1" 



2 m 2 v 2 (1 — g) 2 

In general, an m-partition that is equitable with respect to / is not equitable with respect to f 1 ^ 2 ; therefore, the 
inequality ||a;||i < v^ll^lh ls > m general, strict and it can happen that T > T* , i.e., that such partitioning policy 
is not optimal. 

On the other hand, for any vector x € M m it also holds ||a;||2 < H^llli therefore Eq. ( [T9] l becomes 

1 f3 TS p 2 A 2 ^ 1 ^TSP,2 A 2 



/J 



2 A 
TSP,2 



v 2 (l-g) 2 "*""* " m 2 w 2 (l-e) 5 

2 



J a fV*{x)dx 



7TL V 2 (1 — g) 2 



m T* as g — > 1 



Remark 6.7: A similar result can be proven for biased policies: in this case the 3-norm is the relevant norm to 
use. 

B. Distributed Policies for the m-DTRP and Discussion 



The optimality of the SQM policy and Theorem 6.1 suggest the following distributed multi-vehicle version of 
the DC policy: 

1) the vehicles compute in a distributed way an m-median of Q that induces a Voronoi partition that is equitable 
with respect to / and J 1 / 2 , 

2) the vehicles assign themselves to the subregions (so that there is a bijective correspondence between vehicles 
and subregions), 

3) each vehicle executes the single-vehicle DC policy to service demands that fall within its own subregion, by 
using the median of the subregion instead of Pj\ 

(In an identical way, we can obtain a multi -vehicle version of the RH policy.) 

If, given Q and /, an rri-median of Q that induces a Voronoi partition that is equitable with respect to / and 
J 1 / 2 exists, then the above policy is optimal both in light load and arbitrarily close to optimality in heavy load, and 
stabilizes the system in every load condition. There are two main issues with the above policy, namely (i) existence 
of an m-median of Q that induces a Voronoi partition that is equitable with respect to / and J 1 / 2 , and (ii) how to 
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compute it in a distributed way. In [33], we showed that for some choices of Q and / a median Voronoi diagram 
that is equitable with respect to / and J 1 / 2 fails to exist. On the other hand, in [34], we presented a synchronous 
distributed partitioning algorithm that, for any possible choice of Q and /, provides a partition that is equitable 
with respect to / and represents a "good" approximation of a median Voronoi diagram (see [34] for details on 
the metrics that we use to judge "closeness" to median Voronoi diagrams). Moreover, if an m-median of Q that 
induces a Voronoi partition that is equitable with respect to / exists, the algorithm will locally converge to it. 
Accordingly, we define the multi-vehicle Divide & Conquer policy as follows 

Algorithm 3: Multi- Vehicle DC (m-DC) Policy 

1 The vehicles run the distributed partitioning algorithm in [34]. 

2 The vehicles assign themselves to the subregions (this part is indeed a by-product of the distributed algorithm 
[34]). 

3 Each vehicle executes the single-vehicle DC policy inside its own subregion. 



According to Theorem 6.6 the m-DC policy is within m of optimal in heavy load (since the algorithm in [34] 
always provides a partition that is equitable with respect to /), and stabilizes the system in every load condition. 
In general, the m-DC policy is only suboptimal in light load; note, however, that the computation of the global 
minimum of the Weber function H m (which is non-convex for m > 1) is difficult for m > 1 (it is NP-hard for 
the discrete version of the problem); therefore, for m > 1, suboptimality has to be expected from any practical 
implementation of the SQM policy. If an m-median of Q that induces a Voronoi partition that is equitable with 
respect to / exists, the m-DC will locally converge to it, thus we say that the m-DC policy is "locally" optimal in 
light load. 

Note that, when the density is uniform, a partition that is equitable with respect to / is also equitable with respect 
to J 1 / 2 ; therefore, when the density is uniform the m-DC policy is arbitrarily close to optimality in heavy load 
(see Theorem |6.1| l. 

The m-DC policy adapts to arrival rate A, expected on-site service s, and vehicle's velocity v; however, it requires 
the knowledge of /. The key feature of the partitioning algorithm in [34] is that it is spatially-distributed (see, e.g., 
[35] for a rigorous definition of spatially-ditributed algorithms); therefore, the m-DC policy is indeed a distributed 
control policy. 

VII. Simulation Experiments 

In this section we discuss, through the use of simulations, the heavy-load behavior of the DC policy and the RH 
policy, and their relative performance. 

All simulations are performed in Matlab®7.4, with external calls to the program linkern. In all simulations 
we consider a circular environment Q with area equal to 1, a vehicle's velocity v = 1, and an on-site service time 
uniformly distributed in the interval [0, 1] (thus s = 0.5). In each simulation run we iterate the policy (either DC or 
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RH) until a steady-state is reached; we use the following method to declare that the system is in steady-state: we 
filter the sequence of iterations' time-lengths by using a moving-average filter with window size 15, and we stop 
as soon as the linear least-square fit of the previous 300 iterations' time-lengths yields a slope with magnitude less 
than 0.1. On average, when using as initial condition n = (g/s) Tpc (respectively, n = (g/s) TrhX a steady-state 
(according to the previous criterion) is reached after ~ 1500 iterations; clearly, larger values of g imply a larger 
relaxation time. The spatial density / : Q — > R + used in the simulation experiments is illustrated in Fig. [4] The 



Fig. 1. Geometry of spatial density /. 

regions Qi and Q2 have areas e and 1 — e respectively. Within each region demands are uniformly distributed. 
Demands fall in region Qi with probability 1 — 5 and in region Q2 with probability 5. Thus, the density is piecewise 
uniform with: 



If 1 — 5 = e, then the density / is uniform; if, instead, 1 — 5 > e, then the density / has a peak in subregion Q\. 
In all simulation experiments, we set e = 0.1. 

A. Heavy-Load Performance of the DC Policy 

We first consider a uniform spatial density, i.e., 5 — 1 — e = 0.9. We consider 5 values of the load factor, namely 
q = 0.9,0.93,0.95,0.97,0.99. For each value of g, we perform 50 runs by executing the DC policy with r = 1. 
In each run we iterate the policy until a steady-state is reached (in the sense described before), and we compute 
the average system time by considering the last 300 iterations; this value is then averaged over the 50 runs, and is 
finally divided by T* (whose expression is given in Eq. Q). Red circles in the left-hand figure of Fig. [^represent 
such ratios for the different values of g. We then repeat the simulation process by executing the DC policy with 
r = 16. Black squares in the left-hand figure of Fig. [2] represent the ratios between the experimental system time 
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Fig. 2. Left Figure: Ratio between experimental system time under the DC policy and T* in the case of uniform density (i.e., 8 = 0.9). 
Right Figure: Ratio between experimental system time under the DC policy and f * in the case of non-uniform density (S = 0.6). Red circles 
correspond to the DC policy with r = 1, while black squares correspond to the DC policy with r = 16. 



and T* for the different values of g. It can be observed that for r = 1 the ratios Tuc IT* decrease as g approaches 



one and tend to 2, in accordance with the upper bound in Theorem 4.2 (in fact, 1 + 1/r = 2 in this case). Similarly, 
for r = 16 the ratios Tuc/T* decrease as g approaches one and tend to (1 + 1/16) « 1.06, in accordance again 



with the upper bound in Theorem 4.2 These results confirm the theoretical analysis in Section IV and show that 
the upper bound |7]i is indeed tight for the various values of r. Note that for all values of g the experimental system 
time is larger than the upper bound; this is due to one or a combination of the following reasons: First, the upper 
bound formally holds only in the limit g — > 1~. Second, we are using an approximate solution for the optimal TSP. 

The same set of simulations is performed for a non-uniform spatial density; in particular we consider <5 = 0.6, 
i.e., a peak in the small subregion Qi. Note that in this case a simultaneously equitable partition can be trivially 
obtained by using radial cuts. Results are shown in the right-hand figure of Fig. [2] Again, red circles represent the 
ratios between the experimental system time and T* for r = 1, while black squares represent the ratios between 
the experimental system time and T* for r = 16. As before, the results are in accordance with the the upper bound 
in Theorem 14.21 

Finally, from Fig. [2] it can be observed that as g — > 1~ the DC policy with r = 16 performs better than the DC 



policy with r = 1, precisely by a factor of 2 in accordance with the upper bound in Theorem 4.2 On the other 
hand, for moderate values of g, say g « 0.9, the DC policy with r = 1 performs better than the DC policy with 
r = 16. This result is expected since for moderate values of g and "large" values of r the number of demands in 
each subregion is low, and therefore Eq. ([TJ is no longer applicable (while it is still applicable for r = 1). 



B. Heavy-Load Performance of the RH Policy 

We first consider a uniform spatial density. We consider 3 values of the load factor, namely g = 0.95, 0.97, 0.99, 
and 7 values for 77, namely 77 = 0.1,0.2,0.3,0.7,0.8,0.9,1. Formally, the RH policy is not defined for 77 = 1, 
however, by setting r\ = 1 in the definition of the RH policy, we simply obtain the DC policy with r = 1. For each 
pair (g, 77), we perform, as before, 50 runs by executing the RH policy (with the corresponding rf). In each run we 
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Fig. 3. Left Figure: Ratio between experimental system time under the RH policy and T* in the case of uniform density (i.e., 6 = 0.9). Right 
Figure: Ratio between experimental system time under the RH policy and T* in the case of non-uniform density (8 = 0.6). 



iterate the policy until a steady-state is reached, and we compute the average system time by considering the last 
300 iterations; this value is then averaged over the 50 runs, and is finally divided by T* . Results are shown in the 
left-hand figure of Fig. [3] The same set of simulations is performed for a non-uniform spatial density; in particular 
we consider S — 0.6. Results are shown in the right-hand figure of Fig. [3] 

The results confirm that the system time Trh follows the A|Q|/(1 — g) 2 growth predicted by Theorem |5.l| In 
Section [v] we argued that in heavy load T RH (v) < 2T* (see Eq. ( fT8l >) for all 77 S (0, 1); from Fig. [5] it can 
be observed that as g approaches one this asymptotic bound seems to be confirmed, both for a uniform / and a 
non-uniform /. Finally, we also argued in Section [V] that the performance of the RH policy should improve as the 
horizon becomes shorter: this is indeed the case, in particular simulation results show that optimal performance is 
achieved for rj « 0.2. 

C. Comparison between DC policy and RH policy 

The RH policy should be compared with the version of the DC policy that is also adaptive with respect to /, 
i.e., with the DC policy with r = 1. Indeed, as observed before, the DC policy with r = 1 is the same as the RH 
policy when we set 77 = 1. Therefore, we can compare the two policies by looking at Fig. [3] One should note that 
the RH policy with 77 w 0.2 performs consistently better than the DC policy with r = 1, in particular it decreases 
the system time by a factor w 1.25. 

D. Execution of the Multi-Vehicle DC Policy 

The m-DC policy adopts a partitioning scheme to distribute the workload among the agents; we refer the interested 
reader to [34] for a discussion about the performance of the distributed partitioning algorithm that the m-DC policy 
uses. Here, we only show a snapshot of a simulation experiment with 8 vehicles; the vehicles operate in the unit 
square with a uniform density /, and execute the m-DC policy with r = 1. The load factor is g = 0.8; 
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Fig. 4. Example of execution of the m-DC policy, m = 8 , Q = 0.8, <5 = 0.9. 

TABLE I 

Unbiased Policies for the 1-DTRP 



Properties 


DC Policy, r — > oo 


DC Policy, r = 1 


RH Policy 


UTSP Policy, r -» oo 


Light-load performance 


optimal 


optimal 


optimal 


not optimal 


Heavy-load performance 


optimal 


within 100 % of optimum 


within 100% of optimum* 


optimal 


Adaptive to A, 5, and v 


yes 


yes 


yes 


no 


Adaptive to / 


no 


yes 


yes 


no 



TABLE II 

Unbiased Policies for the m-DTRP 



Properties 


DC Policy, r — > oo 


UTSP Policy, r -> oo 


Light-load performance 


"locally" optimal 


not optimal 


Heavy-load performance 


optimal for uniform /, within m of optimal in general 


optimal 


Adaptive to A, s, and v 


yes 


no 


Adaptive to / 


no 


no 


Distributed 


yes 


no 



VIII. Conclusion 

The DTRP is a widely applicable model for dynamic task-assignment problems in robotic newtorks. In this 
paper, our objective was to design unbiased policies for the DTRP that (i) are adaptive (in particular do not require 
the knowledge of the arrival rate A and the statistics of the on-site service time), (ii) enjoy provable performance 
guarantees (in particular, are provably stable under any load condition), and (iii) are distributed. To this end, we 
introduced two new policies for the 1-DTRP, namely the DC policy and the RH policy, that were subsequently 
extended to the multi-vehicle case through the idea of partitioning policies. 

Tables |I| and |H| provide a synoptic view of the results we achieved; in particular, our policies are compared with 
the best unbiased policy available in the literature, i.e., the UTSP policy with r — > oo. In Table [I] an asterisk * 
signals that the result is heuristic. 
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This paper leaves numerous important extensions open for further research. First, although unbiased service 
seems to be a natural constraint in most applications, in some cases this constraint might be relaxed and it is then 
of interest to design optimal or near-optimal biased policies that are adaptive and distributed (recall that allowing 
biased service may result in a strict reduction of the optimal mean system time for any non-uniform density /). 
Second, finding an unbiased policy for the 1-DTRP that is optimal in heavy load and does not rely on the knowledge 
of / is both practically important and theoretically interesting. Third, the m-DC policy requires the knowledge of / 
for the partition of the environment among the vehicles: we plan to design asynchronous distributed algorithms for 
equitable environment partitioning that do not rely on the knowledge of /. Finally, in our analysis, we considered 
omni-directional vehicles with first order dynamics: non-holonomic constraints will have to be taken into account 
for practical application to UAVs or other systems. 
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